Abstract

We present an algorithm for detecting a specified set 
of targets for an Automatic Target Recognition 
(ATR) application. ATR involves processing
images for detecting, classifying, and tracking targets
embedded in a background scene. We address the
problem of discriminating between targets and
non-target objects in a scene by evaluating 40x40 image
blocks belonging to an image. Each image block is 
first projected onto a set of templates specifically 
designed to separate images of targets embedded in a 
typical background scene from those background 
images without targets. These filters are found using
directed principal component analysis, which
maximally separates the two groups. The projected
images are then clustered into one of n classes based 
on a minimum distance to a set of n cluster 
prototypes. These cluster prototypes have previously 
been identified using a modified clustering algorithm 
based on prior sensed data. Each projected image 
pattern is then fed into the associated cluster’s 
trained neural network for classification. 

A detailed description of our algorithm will be given 
in this paper. We outline our methodology for 
designing the templates, describe our modified 
clustering algorithm, and provide details on the 
neural network classifiers. Evaluation of the overall 
algorithm demonstrates that our detection rates 
approach 96% with a false positive rate of less than
0.03%.

1. Introduction 

Automatic target recognition (ATR) has been the
subject of much prior work. ATR involves the
automatic detection, classification, and
tracking of a target located, or camouflaged, in an
image scene. The typical recognition procedure is
a three-stage process: segmentation,
feature extraction, and classification.
The segmentation process is useful for dividing the
image space into separate regions of interest. The
feature extraction process allows the ATR system to
identify and classify targets based on relevant
features, and the classification process involves
detecting and consequently identifying the target in
question.

An ATR system must be invariant to vantage
point and imaging conditions, including illumination changes,
shadowing, perspective distortion, and occlusion. In
aerial ATR applications, the input image is typically
an on-line aerial image acquired by a digital camera.
Such real world imagery is affected by climate, 
season, weather, and time of day. An aerial image is 
also subject to geometric changes, such as position, 
orientation, and scale variations. There are many 
other problems which face ATR systems. Normally, 
the target recognition process is highly data 
dependent. Most systems are only capable of 
recognizing a pre-specified number of targets and 
are unable to expand their object database. In 
addition, many ATR systems are encoded with 
predetermined tolerances resulting in a tendency to 
be very sensitive to scale and orientation changes. 
MODALS [11], a 3-D multiple object detection and 
location system, utilizes a neural network to 
simultaneously segment, detect, locate, and identify 
multiple targets. Although MODALS is able to 
provide robust detection, high classification, and a 
low false alarm rate, it is not rotation or scale 
invariant. SAHTIRN [1] performs automatic target 
recognition through a three-stage process using an 
edge detector, a multi-layer feedforward clustering
neural network and a neural network classifier. 
SAHTIRN is able to successfully classify objects 
with varying scale and orientation parameters, but is 
not robust when faced with changes in lighting 
conditions. Greenberg and Guterman [5] use neural
networks to address the issue of target classification, 
but assume the ATR-detection process has 
previously executed and has already identified 
targeted regions of interest. 

Our research objective is to develop a novel 
technique which autonomously detects, in real time, 
all target objects embedded in a background image 
scene. The evaluation of these algorithms is based 
on inserting target images into real scenes acquired 
from video input. In real time, we will reduce the
data dimensionality of a scene using an optimal set 
of templates and spatially locate targets in the scene 
with a neural network classifier. Figure 1 provides 
an overview of our approach for detecting a known
set of targets in a background image. The rest of this 
paper describes the methodology used to investigate 
autonomous target detection in detail. 



Figure 1: The data processing path for each 40x40 image 
block extracted from the acquired video input. The image
block is projected onto a set of filters, associated with a 
particular cluster, and then classified with the associated 
neural network. 


2. Technique 


A. Background and Target Data Set 

The background image scenes used in this research
effort were acquired by video camera at the JPL
site in Pasadena. We segment these background
images into 40x40 image blocks for input into our
algorithm (Fig. 2: Top). Target objects (Fig.
2: Middle) are modeled from an actual cruise missile
and represent various scale and rotation perspectives
of the missile. These synthetic target objects are used
for training the algorithm and are embedded into the
background image such that:
where $t_{xy}$ is the pixel intensity value of the target
image at (x,y), $b_{xy}$ is the pixel value of the
background image at (x,y), and $I_T$ is the embedded
target-background image block. Example embedded
target-background images are shown in Figure
2: Bottom.

Once we extract our background and embedded 
target data set, we perform a preprocessing step in 
order to account for time-of-day lighting variations 
in the image set. We subtract the average image
block intensity value from each pixel such that:

$$I^c_{xy} = i_{xy} - \frac{1}{N} \sum_{(u,v) \in I} i_{uv}$$

where $i_{xy}$ is the image intensity of pixel (x,y) in
image block I, N is the size of image block I
(40x40), and $I^c$ is the corrected image block to be
used in our algorithm.
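
The mean correction can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than the original implementation; the function name `mean_correct` and the synthetic 40x40 block are assumptions made for the example.

```python
import numpy as np

def mean_correct(block):
    """Subtract the block's average intensity from every pixel."""
    block = np.asarray(block, dtype=np.float64)
    return block - block.mean()

# Hypothetical 40x40 block: the same pattern seen under two uniform lighting offsets.
rng = np.random.default_rng(0)
pattern = rng.uniform(0.0, 50.0, size=(40, 40))
dim = mean_correct(pattern + 10.0)
bright = mean_correct(pattern + 120.0)
```

Both offset copies of the pattern map to the same corrected block, which is the kind of insensitivity to a uniform lighting shift that this step is meant to provide.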

Using this data set, we can train an algorithm 
capable of intelligently detecting a target embedded 
within a background image scene. The next section 
describes our approach for the development of such 
an algorithm. 






Figure 2. Top Row: Background Images; Middle 
Row: Target Objects; Bottom Row: Background 
Images with Embedded Targets 


B. Algorithm Description 

Given a set of targets T, the goal of the algorithm is
to detect, in real time, any target $t \in T$ present in a
40x40 image block extracted from a background scene.
After the data preprocessing step, we begin by 
projecting an image block onto a set of templates 
specifically designed to separate signatures derived 
from a target embedded in a background image from 
other typical background images. These projections, 
or patterns, are then clustered into one of n classes 
based on their distance to a set of n cluster
prototypes. These cluster prototypes have previously 
been identified using a modified clustering algorithm 
based on prior sensed data. Associated with each 
cluster is a trained neural network classifier. After 
clustering, the projected image pattern is fed through 
this associated trained neural network for detection. 

In order to accomplish our target detection goal, 
prior knowledge must be derived through the 
following algorithmic preprocessing steps: 

i. Derive a set of linear filters used to 
optimally separate targets embedded in a 
background image from other background 
images. 

ii. Identify a set of cluster prototypes used to 
classify the projected image patterns. 

iii. Train an expert neural network
classifier for each cluster that responds
with 1 when fed embedded
target-background image patterns and -1
otherwise.

i. Linear Filter Sets 

The filtering step involves an orthogonal sub-space
projection of each image block. It is used to
optimally linearly separate the embedded
target-background images from those images without
targets. This is a standard technique used to reduce
the dimensionality of the image block [from 1600
(40x40) to 17 dimensions] while preserving as much
of the signal as possible. The filters associated with a
given prototype are derived from the distribution of
a background image (noise) and the distribution of
potential targets embedded in that background
(signal). The two groups can be optimally separated by
maximizing the signal to noise ratio between them
using directed principal components analysis
(DPCA). To characterize the distribution for the
background image, the covariance matrix, $R_i$, is
found for image blocks which do not contain targets.
We characterize the mixed target-background image
distribution by its covariance matrix, $S_i$.

We are interested in finding a set of orthogonal basis
vectors $W_i$ that maximizes the expected signal to
noise ratio of these two distributions defined by their
respective image sets. The generalized eigenvector
solution:

$$S_i W_i = \lambda_i R_i W_i$$


accomplishes this. The filters defined by $W_i$ are
the directed components used in our algorithm. They
essentially steer the eigenvector solution away from 
dimensions of high noise variance in a linearly 
optimal fashion. Figure 3 shows a subset of the 
linear filter set. 
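
The generalized eigenvector problem can be sketched with NumPy alone by whitening with a Cholesky factor of the background covariance. The Results section states that Matlab was used, so this is an illustrative equivalent and not the original code. The function name `directed_components`, the optional `ridge` term, and the 6-dimensional toy data are assumptions of the sketch.

```python
import numpy as np

def directed_components(S, R, k, ridge=0.0):
    """Return the k largest generalized eigenvalues and eigenvectors of S w = lam R w.

    S: covariance of target-plus-background blocks (signal), shape (d, d)
    R: covariance of background-only blocks (noise), shape (d, d)
    ridge: optional diagonal loading for R (an assumption of this sketch,
           useful when R is estimated from few samples and is near singular)
    """
    d = R.shape[0]
    L = np.linalg.cholesky(R + ridge * np.eye(d))   # R = L L^T
    A = np.linalg.solve(L, S)                       # L^{-1} S
    C = np.linalg.solve(L, A.T)                     # L^{-1} S L^{-T} (S symmetric)
    C = 0.5 * (C + C.T)                             # remove round-off asymmetry
    lam, V = np.linalg.eigh(C)                      # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:k]                 # indices of the k largest
    W = np.linalg.solve(L.T, V[:, top])             # map back: w = L^{-T} v
    return lam[top], W

# Toy data in 6 dimensions (the paper reduces 1600 dimensions to 17).
rng = np.random.default_rng(0)
noise_scale = np.array([5.0, 5.0, 1.0, 1.0, 1.0, 1.0])
background = rng.normal(size=(1000, 6)) * noise_scale
signal = np.zeros((1000, 6))
signal[:, 3] = rng.normal(scale=3.0, size=1000)     # hypothetical target energy
targets = rng.normal(size=(1000, 6)) * noise_scale + signal

R = np.cov(background, rowvar=False)
S = np.cov(targets, rowvar=False)
lam, W = directed_components(S, R, k=3)
projected = targets @ W                             # reduced patterns, shape (1000, 3)
```

In the toy data the first two dimensions carry most of the variance but no target energy, so ordinary principal components would favor them. The leading directed component instead concentrates on the dimension where the target adds energy relative to the background.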



Figure 3. Top 15 directed principal components


ii. Clustering 

To effectively simplify the distribution of data 
classified by an expert neural network, we partition 
the incoming projected image patterns drawn from a 
known distribution of background and embedded 
target-background images into a number of 
predetermined groups by using the prototypes $P_i$ of a
clustering algorithm. The clustering algorithm is run 
on previously acquired data that reflects the 
distribution of the scene being analyzed. 

The clustering algorithm employed is a modified 
version of a standard clustering technique outlined in 
Duda and Hart [3]. The standard algorithm uses a
least squares criterion to minimize the
distance between the patterns in each of n randomly
selected groups and that group's prototype. The criterion minimized by the standard
clustering algorithm is: 

(1) $cost = \sum_i \sum_j \lVert p_j - P_i \rVert$

where i is one of n clusters, $P_i$ is the prototype of
cluster i, and $p_j$ is a projected
image pattern in that cluster. The clustering
algorithm iterates through each projected image 
pattern and determines if moving the pattern to 
another group reduces the overall cost. If it does, the 
pattern is moved to the other group and the 
associated averages of each prototype cluster are 
recalculated. This continues until the moving of 
patterns no longer reduces the overall cost. The 
resultant cluster prototypes are then employed by our
algorithm to segment the scene. 

As n increases, the overall cost is likely to go down,
as a larger number of groups allows the clustering
algorithm to better fit the given distribution.
In addition, the clustering algorithm, as described, is
independent of the variation in the clustering set. It
does not take into account any information that we
might have concerning the patterns already
belonging to the cluster. What we need is a way to
penalize a cluster when adding a pattern that
differs from the majority of patterns already in the
cluster. This, in effect, makes the clustering
algorithm more likely to group embedded
target-background patterns together while still discarding
those background patterns which may have similar
characteristics. To accomplish this, we modified the
criterion given in (1) to reflect this knowledge. The
change in (1) consists of simply weighting the initial
criterion by a term reflecting variance. The modified
criterion is given by:

(2) $cost = \sum_i (1 + w_i) \sum_j \lVert p_j - P_i \rVert$

where

$$w_i = \frac{nt_i}{np_i} \left(1 - \frac{nt_i}{np_i}\right)$$

$nt_i$ is the number of embedded target patterns in
cluster i, and $np_i$ is the number of background
patterns in cluster i. Patterns that are unlike those
already in the cluster are weighted more heavily in the
cost of the clustering algorithm than similar ones, thus
allowing the clustering algorithm to naturally link
like elements together. Figure 4 shows a
background scene segmented with the derived
clustering prototypes.

Figure 4. A Segmented Background Image Scene
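
The weighted criterion (2) can be evaluated for a given assignment of patterns to clusters as in the following Python sketch. The function names, the toy patterns, and the handling of a cluster with no background patterns (where the weight is undefined and is set to 0 here) are assumptions of the sketch, not part of the original algorithm description.

```python
import numpy as np

def cluster_weight(n_target, n_background):
    """w_i = (nt_i / np_i) * (1 - nt_i / np_i), following the definitions in the text."""
    if n_background == 0:
        return 0.0          # assumption: the text does not define this case
    ratio = n_target / n_background
    return ratio * (1.0 - ratio)

def modified_cost(patterns, is_target, labels, n_clusters):
    """Criterion (2): sum_i (1 + w_i) * sum_j ||p_j - P_i||."""
    total = 0.0
    for i in range(n_clusters):
        members = labels == i
        if not members.any():
            continue
        pts = patterns[members]
        prototype = pts.mean(axis=0)                 # P_i is the cluster average
        n_target = int(is_target[members].sum())
        n_background = int(members.sum()) - n_target
        w = cluster_weight(n_target, n_background)
        total += (1.0 + w) * np.linalg.norm(pts - prototype, axis=1).sum()
    return total

# Hypothetical 2-D patterns in two clusters.
patterns = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 0, 1, 1])
no_targets = np.zeros(5, dtype=bool)
one_target = np.array([False, False, True, False, False])

standard = modified_cost(patterns, no_targets, labels, 2)   # all w_i = 0, same as (1)
weighted = modified_cost(patterns, one_target, labels, 2)   # cluster 0 has w = 0.25
```

With no target patterns every weight is zero and the value reduces to criterion (1). Marking one of the three patterns in cluster 0 as a target gives that cluster a weight of 0.25, so its contribution to the cost grows by a quarter.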

iii. Classification 


The last step in our preprocessing algorithm involves
classifying each projected image pattern belonging
to a cluster with a neural network [6]. The networks
are trained with data drawn from the two
distributions: background patterns $R_i$ and embedded
target-background patterns $S_i$. The expert network
for class i is required to respond with 1 for elements
drawn from $S_i$ and -1 for those drawn from $R_i$. We
use a simple feed-forward network model employing
17 inputs and 10 sigmoidal hidden units trained with
backpropagation to get the desired result. The output
can then be thresholded to achieve the desired
detection rate or false positive rate by examining the
receiver operating characteristic (ROC) curves.
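
The forward pass of one such expert network can be sketched as follows. The text specifies 17 inputs and 10 sigmoidal hidden units; the use of tanh for both layers (so that the output lies between -1 and 1), the single output unit, the random untrained weights, and the threshold of 0 are assumptions made for illustration only.

```python
import numpy as np

def expert_forward(p, W1, b1, W2, b2):
    """Forward pass of a 17-10-1 feed-forward network with tanh units."""
    hidden = np.tanh(W1 @ p + b1)              # 10 sigmoidal hidden units
    return float(np.tanh(W2 @ hidden + b2))    # scalar output in (-1, 1)

# Untrained, randomly initialized weights (hypothetical values).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(10, 17))
b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=10)
b2 = 0.0

p = rng.normal(size=17)                        # a projected image pattern
score = expert_forward(p, W1, b1, W2, b2)
is_target = score > 0.0                        # hypothetical threshold
```

In practice the weights would come from backpropagation training on the patterns of the associated cluster, and the threshold would be chosen from the ROC curve rather than fixed at 0.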

C. Algorithm Implementation 

After we implement the preprocessing steps, we can 
perform real-time intelligent target detection. After 
subtracting out the mean, each image block is 
projected onto a linear filter set. This projected 
image pattern is then compared to the set of
precomputed cluster prototypes. Based on the Euclidean
distance, the pattern is grouped with the closest 
prototype. 
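
The assignment to the closest prototype is a nearest-neighbor lookup under Euclidean distance. A minimal sketch, with a hypothetical function name and toy 2-D prototypes standing in for the 17-dimensional ones:

```python
import numpy as np

def nearest_prototype(p, prototypes):
    """Return the index of the prototype closest to pattern p (Euclidean distance)."""
    distances = np.linalg.norm(prototypes - p, axis=1)
    return int(np.argmin(distances))

# Three hypothetical prototypes and one projected pattern.
prototypes = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
p = np.array([4.0, 6.0])
cluster = nearest_prototype(p, prototypes)
```

Here the pattern lies closest to the second prototype, so `cluster` is 1 and the second expert network would be used for classification.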

We then use the trained neural network classifier to 
evaluate whether or not the image pattern contains a 
target from T. The neural network for each cluster 
group takes as input the projected values of the 
image and outputs a value. Values above a threshold 
are considered images with targets and those below 
are assigned to background. 

For the evaluation to be effective, the
cluster prototypes generated and the image blocks
used in training the classifier must be derived from
scenery with roughly the same distributions as those
encountered in the operational test. The following
pseudo code outlines the important features of the
algorithm.

Let $I_{xy}$ be an image block with a centroid located at
(x,y) in the scene:

for all x,y:

1. $I_{xy} \rightarrow I^c$

2. $I^c W \rightarrow p_{xy}$

3. find i for $\min_i(\lVert p_{xy} - P_i \rVert)$

4. if $N_i(p_{xy}) > thr_i$ then target(x,y)
otherwise background(x,y)

where $I^c$ is the mean corrected image block, $W$ is the
linear filter set, $p_{xy}$ is the projected image block, $P_i$ is
the closest cluster prototype, $N_i$ is the neural
network classifier associated with cluster i, and $thr_i$
is the threshold value discriminating between images
with targets and those without.
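
The four steps can be put together in a short Python sketch. It follows the text in treating outputs above the threshold as targets. The function name `detect_block`, the representation of each expert network as a Python callable, and the tiny 2x2 blocks with a 2-column filter set are assumptions chosen to keep the example small; the paper uses 40x40 blocks and 17 filters.

```python
import numpy as np

def detect_block(block, W, prototypes, experts, thresholds):
    """Return True if the image block is classified as containing a target."""
    corrected = np.asarray(block, dtype=np.float64)
    corrected = corrected - corrected.mean()                     # step 1: mean correction
    p = W.T @ corrected.ravel()                                  # step 2: project onto filters
    i = int(np.argmin(np.linalg.norm(prototypes - p, axis=1)))   # step 3: closest prototype
    return bool(experts[i](p) > thresholds[i])                   # step 4: threshold expert output

# Toy setup: 2x2 blocks, 2 filters, 2 clusters, stand-in experts with fixed outputs.
W = np.eye(4)[:, :2]                                  # hypothetical filter set, shape (4, 2)
prototypes = np.array([[2.0, 0.0], [-2.0, 0.0]])
experts = [lambda p: 1.0, lambda p: -1.0]             # stand-ins for trained networks
thresholds = [0.0, 0.0]

block_a = np.array([[3.0, 1.0], [0.0, 0.0]])          # projects to (2, 0), cluster 0
block_b = np.array([[0.0, 0.0], [1.0, 3.0]])          # projects to (-1, -1), cluster 1
found_a = detect_block(block_a, W, prototypes, experts, thresholds)
found_b = detect_block(block_b, W, prototypes, experts, thresholds)
```

Scanning a scene amounts to calling such a function for every (x,y) block position; all of the expensive work (filters, prototypes, trained networks) is done beforehand.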

3. Results 

We evaluated the overall performance of the 
algorithm using the described target set and 
background images. 

The background scenes consisted of over one million 
image blocks of which less than 5% were used in 
developing a set of training data. Testing data 
consisted of randomly drawn image blocks from the 
background scenes. Embedded target images were
generated by randomly selecting images from the 
target set and mixing them with arbitrary background 
image blocks. A sub-sample of the training data 
(1000 examples each) was used to generate the 
covariance matrices R and S. The generalized
eigenvector solution W was then solved using 
Matlab. The training data was then projected onto 
the filter set and evaluated with the clustering 
technique to realize the cluster prototypes (P) used 
in step i of the algorithm. 

Training data for the neural network was again
drawn from the set of training image blocks. In
addition, a portion of the training data for the
network was used to halt training (a hold out set) as
described in Haykin [6]. Training of the networks
used 80,000 examples, 1/2 target and 1/2 background
images. The hold out set consisted of 40,000
examples not trained upon. It was used to stop training
in order to prevent overlearning on the training data,
which tends to decrease generalization.



Figure 5: Detection vs False Positive Rate 



Figure 6: Detection Output 


Our results give us a detection rate of 96% with a
false positive rate of less than 0.03%. These results
were obtained on 100,000 novel 40x40 image
blocks. Figure 5 shows our receiver operating
characteristic (ROC) curve
(showing Detection vs. False Positive Rate) and
Figure 6 shows an example of the detection output.
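
An operating point on such a curve can be chosen by picking the threshold that keeps the false positive rate on held-out background scores within a budget, then reading off the detection rate. The following sketch illustrates the idea; the function names and the score values are hypothetical and are not the data behind the reported results.

```python
import numpy as np

def threshold_for_fpr(background_scores, max_fpr):
    """Smallest observed threshold whose false positive rate is at most max_fpr.

    A block is called a target when its score is strictly above the threshold.
    """
    s = np.sort(np.asarray(background_scores, dtype=np.float64))
    allowed = min(int(np.floor(max_fpr * s.size)), s.size - 1)  # false positives permitted
    return float(s[s.size - allowed - 1])

def rates(target_scores, background_scores, thr):
    """Detection rate and false positive rate at threshold thr."""
    detection = float(np.mean(np.asarray(target_scores) > thr))
    false_pos = float(np.mean(np.asarray(background_scores) > thr))
    return detection, false_pos

# Hypothetical network outputs on held-out data.
background_scores = [-0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.5]
target_scores = [0.9, 0.8, -0.3, 0.7]

thr = threshold_for_fpr(background_scores, max_fpr=0.1)
detection, false_pos = rates(target_scores, background_scores, thr)
```

Sweeping `max_fpr` over a range of values and recording the resulting pairs traces out a curve of detection rate against false positive rate.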

4. Conclusion

A novel detection algorithm and our evaluation 
methodology are described here. The detection 
algorithm was shown to perform detection at a rate 
of 96% with false positives less than 0.03% on a set 
of targets mixed with background images. 